Dynamic load balancing of distributed SPMD computations with explicit message-passing
Abstract
Distributed systems have the potential to become an alternative platform for parallel computations. However, many obstacles remain; one of the most serious is that distributed systems typically consist of shared heterogeneous components with highly variable computational power. In this paper we present a load balancing support that checks the load status and, if necessary, adapts the workload to dynamic platform conditions through data migrations from overloaded to underloaded nodes. Unlike task migration supports for task parallelism and other data migration frameworks for master/slave-based parallel applications, our support works for the entire class of regular SPMD applications with explicit communications, such as linear algebra problems, partial differential equation solvers, and image processing algorithms. Although we considered several variants (three activation mechanisms, three load monitoring techniques and four decision policies), we implemented only the protocols that guarantee program consistency. The efficiency of the strategies is tested on two SPMD algorithms that are based on the PVM library enriched with special-purpose primitives for data management. As an additional contribution, our research keeps the entire support for dynamic load balancing transparent to the programmer. Although the technical details are beyond the scope of this paper, we point out that the only visible interface of our support is the activation phase.
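The idea of migrating data from overloaded to underloaded nodes can be illustrated with a minimal sketch. It is not the decision policy of the paper, whose details the abstract does not give. It assumes that each node reports a relative speed and that the work consists of interchangeable data rows; the names `target_shares` and `migration_plan` are hypothetical. It also ignores the neighbour relationships that a regular SPMD decomposition would normally have to preserve.

```python
def target_shares(total_rows, speeds):
    """Split total_rows in proportion to node speeds (assumed metric).

    Uses largest-remainder rounding so the shares sum to total_rows.
    """
    total_speed = sum(speeds)
    exact = [total_rows * s / total_speed for s in speeds]
    shares = [int(x) for x in exact]
    leftover = total_rows - sum(shares)
    # Give the remaining rows to the nodes with the largest fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def migration_plan(current, speeds):
    """Return (source, destination, rows) moves that reach the target shares."""
    target = target_shares(sum(current), speeds)
    # Overloaded nodes hold more rows than their target, underloaded fewer.
    surplus = [[i, c - t] for i, (c, t) in enumerate(zip(current, target)) if c > t]
    deficit = [[i, t - c] for i, (c, t) in enumerate(zip(current, target)) if c < t]
    moves = []
    while surplus and deficit:
        src, dst = surplus[0], deficit[0]
        rows = min(src[1], dst[1])
        moves.append((src[0], dst[0], rows))
        src[1] -= rows
        dst[1] -= rows
        if src[1] == 0:
            surplus.pop(0)
        if dst[1] == 0:
            deficit.pop(0)
    return moves


if __name__ == "__main__":
    # Four nodes with equal data; the last node is assumed three times faster.
    print(migration_plan([100, 100, 100, 100], [1, 1, 1, 3]))
```

In a real message-passing program each move would correspond to an explicit send on the source node and a matching receive on the destination, issued at a point where all processes agree on the plan so that the program stays consistent.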
Similar resources
Dynamic Load Balancing in a Message Passing Virtual Parallel Machine
In this paper we will look into the problem of dynamic balancing of tasks in a heterogeneous parallel computing environment. The parallel programs are assumed to be executed in the Single Program Multiple Data (SPMD) style. The criteria for re-balancing the load are discussed; the effect of data movement required in the load balancing is considered; and novel algorithms of dynamic load balancing...
FLEX-MPI: An MPI Extension for Supporting Dynamic Load Balancing on Heterogeneous Non-dedicated Systems
This paper introduces FLEX-MPI, a novel runtime approach for the dynamic load balancing of MPI-based SPMD applications running on heterogeneous platforms in the presence of dynamic external loads. To effectively balance the workload, FLEX-MPI monitors the actual performance of applications via hardware counters and the MPI profiling interface—with a negligible overhead and minimal code modifica...
TPVM: Distributed Concurrent Computing with Lightweight Processes
The TPVM (Threads-oriented PVM) system is an experimental auxiliary subsystem for the PVM distributed system, which supports the use of lightweight processes or "threads" as the basic unit of parallelism and scheduling. TPVM provides a library interface which presents both a traditional, task based, explicit message passing model, as well as a data-driven scheduling model that enables straightf...
Load balancing of irregular parallel divide-and-conquer algorithms in group-SPMD programming environments
We study strategies for local load balancing of irregular parallel divide-and-conquer algorithms such as Quicksort and Quickhull in SPMD-parallel environments such as MPI and Fork that allow nested parallelism to be exploited by dynamic group splitting. We propose two new local strategies, repivoting and serialisation, and develop a hybrid local load balancing strategy, which is calibrated by paramet...
Technische Universität Chemnitz Sonderforschungsbereich 393 Numerische Simulation auf massiv parallelen Rechnern
The characteristics of irregular algorithms make a parallel implementation difficult, especially for PC clusters or clusters of SMPs. These characteristics may include an unpredictable access behavior to dynamically changing data structures or strong irregular coupling of computations. Problems are an unknown load distribution and expensive irregular communication patterns for data accesses and...